Hierarchical clustering


With all the Essentia features included, the dendrogram is skewed and asymmetrical, with most of the branches clustering towards the left side. This suggests that the data are concentrated in one region of the feature space, which makes it harder to assess the similarity between tracks and to interpret the relationships between them.

When filtering down to just the features arousal, instrumentalness, and tempo, the dendrogram becomes more symmetrical and clearer. With fewer and more relevant features, the clusters appear more distinct and balanced, making it easier to pick out tracks that share stronger similarities. This reduced complexity gives a clearer visualization of the data, one that aligns better with the perceived musical traits of the tracks.

Although the two tracks I focused on initially sounded quite similar to me, the dendrogram shows that they are, in fact, quite distant from each other. The first track is positioned on the far left of the dendrogram, grouped closely with tracks like ties-o-2 and wednesday-w-2. In contrast, the second track is placed more towards the right, clustering with tracks like daniel-p-2 and sarya-h-2.
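The effect of restricting the clustering to a few features can be explored with a minimal sketch like the one below. It is not the code behind the dendrograms above: the track names and feature values are invented, and the choice of z-score standardization, Euclidean distance, and average linkage is an assumption made for illustration only.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(names, rows):
    """Agglomerative clustering; returns the merges in order with their heights."""
    points = dict(zip(names, standardize(rows)))

    def dist(pair):
        # Average distance over all cross-cluster pairs of tracks.
        a, b = pair
        return mean(euclidean(points[i], points[j]) for i in a for j in b)

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=dist)
        merges.append((a, b, dist((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical values (arousal, instrumentalness, tempo in BPM);
# these are NOT taken from the class corpus.
tracks = {
    "track-a": (3.1, -0.4, 82.0),
    "track-b": (3.0, -0.5, 85.0),
    "track-c": (5.2, 0.6, 128.0),
    "track-d": (5.4, 0.7, 131.0),
}

merges = average_linkage(list(tracks), list(tracks.values()))
for left, right, height in merges:
    print(left, "+", right, f"at height {height:.2f}")
```

The merge heights correspond to the branch heights in a dendrogram: tracks that merge early (at a low height) are the ones the chosen features consider most alike. Swapping in a different feature subset changes the merge order, which is the effect described above.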

Heatmap


Track 1

When listening to ties-o-2 and wednesday-w-2 and comparing them to my first track, I initially noticed very little similarity, if any. However, the heatmap reveals that these three tracks have relatively similar instrumentalness values. Surprisingly, all three values are negative, which I didn’t expect, particularly for my own track. Although I can hear many instruments throughout the song, including Vietnamese zither, drums, and other traditional instruments, the system does not seem to pick up on them. This raises concerns about whether non-Western traditional instruments can be effectively captured and represented by the system.

Track 2

When comparing daniel-p-2 and sarya-h-2 with my second track, I recognize a similarity in their tempo, as all three tracks have a relatively slow tempo. Additionally, their song structures are quite simple, which I believe explains why their arousal and tempo values are closely grouped together in the heatmap.

Track 1+2

I am still quite concerned about the system’s ability to accurately extract instrumentalness, as my first track has a negative value, while the second track has a positive one. However, upon listening to both tracks, I noticed that while the second track features flute sounds, it seems to predominantly contain electronic sound effects, unlike track 1, which is more focused on the combination of various traditional instruments. This discrepancy raises questions about how the system differentiates between non-Western traditional instruments and electronic sounds, and whether it can fully capture the nuanced instrumental features present in each track.

Classifying

          Truth
Prediction AI Non-AI
    AI     36     17
    Non-AI 13     24


The mosaic on the left illustrates the performance of a classifier attempting to distinguish between AI-generated and non-AI-generated tracks. With a k-nearest-neighbour classifier, the most useful feature combination was instrumentalness, danceability, and tempo. Among the feature combinations tested, it gave the highest counts of correct predictions in both classes: AI predicted as AI, and non-AI predicted as non-AI. This suggests that these features are valuable for identifying whether a track is AI-generated or not.
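To put numbers on the matrix above, the short sketch below derives accuracy, precision, and recall from the four counts. Only the counts come from the table; the variable and function names are illustrative.

```python
# Counts copied from the confusion matrix above, keyed as (prediction, truth).
counts = {
    ("AI", "AI"): 36,
    ("AI", "Non-AI"): 17,
    ("Non-AI", "AI"): 13,
    ("Non-AI", "Non-AI"): 24,
}

total = sum(counts.values())
accuracy = (counts[("AI", "AI")] + counts[("Non-AI", "Non-AI")]) / total


def precision(label):
    """Share of tracks predicted as `label` that truly are `label`."""
    predicted = sum(n for (pred, _), n in counts.items() if pred == label)
    return counts[(label, label)] / predicted


def recall(label):
    """Share of tracks that truly are `label` and were predicted as such."""
    actual = sum(n for (_, truth), n in counts.items() if truth == label)
    return counts[(label, label)] / actual


print(f"accuracy: {accuracy:.3f}")
for label in ("AI", "Non-AI"):
    print(f"{label}: precision {precision(label):.3f}, recall {recall(label):.3f}")
```

With these counts, 60 of the 90 tracks are classified correctly, an accuracy of about 67%. Recall is higher for the AI class (36 of 49) than for the non-AI class (24 of 41), so the classifier misses non-AI tracks more often than AI tracks.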

Histogram of class corpus’s tempo

Music in Advertising Videos and The Study about Vietnamese music

A study by two researchers from Hungary, Monica Coronel and Anna Irimiás, confirms that music plays an essential supporting role in “destination promotional videos” and “tourism marketing,” stimulating both cognitive and affective responses. Specifically, their research reveals that background music can capture attention, reflect a destination’s characteristics, target specific audiences, highlight cultural identity, elicit emotions, and create ambience.

These findings about the importance of music in tourism marketing led me to explore Vietnamese advertising music and compare it with global music trends. In particular, my research question focuses on:

“How does the musical style of Vietnamese advertising music compare to other music? Does it have distinct characteristics, or does it align with broader global trends?”

To represent Vietnamese advertising music, I selected two tracks suitable for advertising videos showcasing Vietnamese culture and nature. After experimenting with generative AI tools, I opted for royalty-free tracks from Pixabay and SoundCloud. I used keywords such as “Vietnam,” “folk instruments,” “adventurous music,” and “travel” on both platforms, and filtered for “bright” mood and “cinematic music” theme on Pixabay. I chose these tracks because they feature Vietnamese folk instruments—a key focus—and include a strong bass that enhances engagement and evokes emotions in listeners, aligning well with the commercial and storytelling purposes of advertising videos.

To support and contextualize the comparisons with other “global music trends”, I will analyze Vietnamese advertising music alongside three Western music styles observed in the class corpus: rock (lennart-p-2), blue jazz (gijs-s-2), and traditional jazz (jasper-v-1). These genres provide contrasting perspectives on harmony, loudness dynamics, timbre, and rhythmic structure, allowing me to assess whether Vietnamese advertising music exhibits distinctive characteristics or aligns with broader global trends.

What are the overall characteristics of these two Vietnamese background music tracks in terms of Essentia features?


This interactive boxplot presents the distribution of various Essentia features extracted from the class corpus. The black points represent all tracks in the dataset, while my tracks are highlighted in pink for better visibility.

My tracks are scattered across different features, showing varying degrees of similarity and uniqueness compared to the “average” track in the corpus:

Key Takeaways

Based on the distribution of my tracks compared to the class corpus, the key insights are:

This visualization provides a clear comparison of how my tracks align with the broader dataset and which features distinguish them. It suggests that Essentia identifies track characteristics well enough to highlight both the similarities and the unique elements of my tracks.

Chromagram


The first chromagram reveals a dynamically structured piece that doesn’t settle on a single tonal center but rather employs a wide array of pitch classes throughout its duration.

Chroma-based and Timbre-based Self-similarity Matrices


Chroma-based Self-Similarity Matrix

The block-like structures and distinct lines are more apparent, indicating sections of the track where harmonic repetition or homogeneity occurs:

Timbre-based Self-Similarity Matrix

The block-like structures are less clear. Instead, the streaks are more blurred and evenly distributed, suggesting that the timbre varies throughout the track.
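How blocks and lines arise in a self-similarity matrix can be illustrated with a toy example. The sketch below is not the code that produced the matrices above: the feature frames are invented, and cosine distance is only one of several distance measures that could be used.

```python
from math import sqrt


def cosine_distance(a, b):
    """0 for frames pointing in the same direction, up to 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def self_similarity(frames):
    """Compare every frame of a track with every other frame of the same track."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]


# Invented 3-dimensional feature frames following an A-A-B-A section pattern.
# Real chroma frames have 12 dimensions; timbre frames depend on the feature used.
A = [1.0, 0.0, 0.2]
B = [0.0, 1.0, 0.1]
frames = [A, A, B, A]

ssm = self_similarity(frames)
for row in ssm:
    print(" ".join(f"{d:.2f}" for d in row))
```

Cells comparing two A frames come out near 0 and cells comparing A with B come out near 1. Consecutive similar frames form the low-distance blocks (homogeneity), while a section that returns later shows up as low-distance cells away from the main diagonal (repetition). When frames differ only gradually from one another, as with the timbre features here, those contrasts blur.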

Chordograms


These chordograms visualize the harmonic structure of Tracks 1 and 2, displaying the evolution of chords over time. The Y-axis represents the different chords used in each track, including major (maj), minor (min), dominant 7th (7), and diminished chords, while the X-axis represents time in seconds. The color intensity indicates the activation strength or presence probability of each chord at any given moment, with bright yellow signifying strong chord presence and dark purple indicating weaker or less frequent occurrences.

Track 1

Track 2
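One common way to obtain the per-chord strengths that a chordogram displays is to compare each chroma frame against a template for every chord. The sketch below assumes binary templates and cosine similarity; this is an illustrative choice, not necessarily the method used for the plots above, and the chroma frame is invented.

```python
from math import sqrt

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Intervals in semitones above the root for each chord quality.
QUALITIES = {"maj": (0, 4, 7), "min": (0, 3, 7), "7": (0, 4, 7, 10), "dim": (0, 3, 6)}


def chord_templates():
    """Binary 12-dimensional template for every root and quality."""
    templates = {}
    for root, name in enumerate(NOTES):
        for quality, intervals in QUALITIES.items():
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[f"{name}:{quality}"] = template
    return templates


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def chord_scores(chroma_frame):
    """Score one chroma frame against all templates (one chordogram column)."""
    return {
        chord: cosine_similarity(chroma_frame, template)
        for chord, template in chord_templates().items()
    }


# Invented chroma frame with energy on C, E and A (index 0 is C).
frame = [0.9, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
scores = chord_scores(frame)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

For this invented frame the highest score goes to A minor, whose template covers exactly the three active pitch classes. Chords that share two of those notes, such as C major or F major, still receive moderate scores, which is why a chordogram often shows several related rows lighting up at once.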

Keygrams


These keygrams exhibit a more ambiguous structure, with a diverse and less clearly defined focus on specific musical keys throughout the track.

(I plan to analyze this further in the future, as I find some aspects of it quite confusing at the moment :) )

Tempograms


Track 1

- Fourier-based tempogram:

- Cyclic tempogram: A simplified visualization that wraps higher tempo harmonics back into the fundamental range

Track 2

- Fourier-based tempogram:

- Cyclic tempogram:

-> Based solely on the tempogram analysis of these two tracks, Vietnamese advertising background music appears to be characterized by a clear, stable fundamental tempo, often accompanied by identifiable harmonic patterns. This rhythmic consistency suits a comfortable listening experience while viewing the nature and culture presented in a video, which is essential in promotional contexts. The tempograms do show changes at certain time points, but these are small and, on careful listening, seem to be mainly due to changes in instrumentation.
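The wrapping step that turns a Fourier-based tempogram into a cyclic one can be sketched as follows. The idea is that tempi related by a factor of two are treated as equivalent and folded into a single tempo octave. The reference tempo of 80 BPM, the number of bins, and the tempo strengths are arbitrary values chosen for illustration, and a real cyclic tempogram applies this folding to every time frame.

```python
from math import log2


def fold_tempo(bpm, reference=80.0):
    """Map a tempo to its octave-equivalent between reference and 2 * reference."""
    octave_position = log2(bpm / reference) % 1.0
    return reference * 2 ** octave_position


def cyclic_profile(fourier_profile, reference=80.0, bins=4):
    """Sum the strengths of octave-related tempi into shared bins."""
    folded = [0.0] * bins
    for bpm, strength in fourier_profile.items():
        position = log2(bpm / reference) % 1.0
        # Clamp to the last bin in case rounding pushes the position to 1.0.
        folded[min(int(position * bins), bins - 1)] += strength
    return folded


# Invented tempo strengths for one time frame: a 120 BPM pulse with its
# half-tempo (60) and double-tempo (240) harmonics, plus a weak 90 BPM peak.
profile = {60.0: 0.3, 120.0: 1.0, 240.0: 0.5, 90.0: 0.2}

print([round(fold_tempo(bpm), 1) for bpm in profile])
print([round(v, 2) for v in cyclic_profile(profile)])
```

In this example 60, 120, and 240 BPM all fold onto the same position, so their strengths add up in one bin while the 90 BPM peak stays separate. This is why a cyclic tempogram tends to show a single clear band where the Fourier-based version shows several parallel lines.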

(I still don’t know how to change the code in order to make all graphs have the same size, if you know, please help meeee! thank you in advance!)